Improving Multilabel Classification by Avoiding Implicit Negativity with Incomplete Data

Authors

  • Derrall Heath
  • Dan Ventura
Abstract

Many real-world problems require multi-label classification, in which each training instance is associated with a set of labels. There are many existing learning algorithms for multi-label classification; however, these algorithms assume implicit negativity, where missing labels in the training data are automatically assumed to be negative. Additionally, many of the existing algorithms do not handle incremental learning, in which new labels could be encountered later in the learning process. A novel multi-label adaptation of the backpropagation algorithm is proposed that does not assume implicit negativity. In addition, this algorithm can, using a naïve Bayesian approach, infer missing labels in the training data. This algorithm can also be trained incrementally, as it dynamically considers new labels. This solution is compared with existing multi-label algorithms using data sets from multiple domains, and the performance is measured with standard multi-label evaluation metrics. It is shown that our algorithm improves classification performance for all metrics by an overall average of 7.4% when at least 40% of the labels are missing from the training data, and improves by 18.4% when at least 90% of the labels are missing.
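The abstract does not give the update rule, so the following is only a minimal sketch of what avoiding implicit negativity could look like in a backpropagation-style loss. It assumes a sigmoid output per label, a binary cross-entropy loss, and a label encoding of `1` (positive), `0` (explicit negative), and `None` (missing). The function name `masked_bce` and the `implicit_negativity` flag are hypothetical names chosen for illustration, not taken from the paper, and the naïve Bayesian inference of missing labels is not shown.

```python
import math


def sigmoid(z):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def masked_bce(logits, labels, implicit_negativity=False):
    """Binary cross-entropy over the output units of one instance.

    labels[i] is 1 (positive), 0 (explicit negative) or None (missing).
    Returns (loss, grads), where grads[i] is d(loss)/d(logits[i]).

    Assumption (not from the paper): a missing label is simply masked,
    so it contributes neither loss nor gradient. With
    implicit_negativity=True the missing label is treated as 0 instead,
    which is the behaviour the abstract argues against.
    """
    loss = 0.0
    grads = []
    for z, y in zip(logits, labels):
        if y is None:
            if not implicit_negativity:
                grads.append(0.0)  # unknown label: no error signal
                continue
            y = 0  # implicit negativity: missing means negative
        p = sigmoid(z)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
        grads.append(p - y)  # gradient of sigmoid + cross-entropy
    return loss, grads


# One instance with three labels; the second label is missing.
logits = [0.0, 0.0, 0.0]
labels = [1, None, 0]

_, masked_grads = masked_bce(logits, labels)
_, implicit_grads = masked_bce(logits, labels, implicit_negativity=True)
print(masked_grads)    # [-0.5, 0.0, 0.5]
print(implicit_grads)  # [-0.5, 0.5, 0.5]
```

In this sketch the only difference between the two settings is the middle gradient: with masking, the unit for the missing label receives no push toward the negative class, whereas under implicit negativity it is trained as if the label had been observed to be absent.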

Similar resources

Type Prediction in Noisy RDF Knowledge Bases Using Hierarchical Multilabel Classification with Graph and Latent Features

Semantic Web knowledge bases, in particular large cross-domain data, are often noisy, incorrect, and incomplete with respect to type information. This incompleteness can be reduced, as previous work shows, with automatic type prediction methods. Most knowledge bases contain an ontology defining a type hierarchy, and, in general, entities are allowed to have multiple types (classes of an instanc...

Efficient decomposition-based multiclass and multilabel classification

Decomposition-based methods are widely used for multiclass and multilabel classification. These approaches transform or reduce the original task to a set of smaller possibly simpler problems and allow thereby often to utilize many established learning algorithms, which are not amenable to the original task. Even for directly applicable learning algorithms, the combination with a decomposition-s...

MLSLR: Multilabel Learning via Sparse Logistic Regression

Multilabel learning, an emerging topic in machine learning, has received increasing attention in recent years. However, how to effectively tackle high-dimensional multilabel data, which are ubiquitous in real-world applications, is still an open issue in multilabel learning. Although many efforts have been made in variable selection for traditional data, little work concerns variable selection ...

Graded Multilabel Classification: The Ordinal Case

We propose a generalization of multilabel classification that we refer to as graded multilabel classification. The key idea is that, instead of requesting a yes-no answer to the question of class membership or, say, relevance of a class label for an instance, we allow for a graded membership of an instance, measured on an ordinal scale of membership degrees. This extension is motivated by pract...

Adapting non-hierarchical multilabel classification methods for hierarchical multilabel classification

In most classification problems, a classifier assigns a single class to each instance and the classes form a flat (non-hierarchical) structure, without superclasses or subclasses. In hierarchical multilabel classification problems, the classes are hierarchically structured, with superclasses and subclasses, and instances can be simultaneously assigned to two or more classes at the same hierarch...

Journal:
  • Computational Intelligence

Volume 30

Publication date: 2014